06. Forests of Randomized Trees
Random Forests
Random Forests are ensemble prediction algorithms that use both random row and random column selection. Each tree in the ensemble is created as follows:

- If the number of rows in the training dataset is N, generate the dataset for each constituent tree by choosing N rows at random, with replacement, from the original data.
- If there are M columns in the training dataset, pick a number m << M. At each node, select m columns at random out of the M and use the best of the possible splits on those m columns to split the node. The value of m is held constant while the forest is grown; it is known as the max_features parameter, and its default value is sqrt(M). (Both sampling steps are sketched in code after this list.)
- Grow each tree to the largest extent possible.
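To make the row and column sampling concrete, here is a minimal sketch assuming NumPy arrays X (shape N x M) and y; the helper names bootstrap_rows and random_feature_subset are illustrative, not part of any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_rows(X, y):
    """Choose N rows at random, with replacement, from the original data."""
    n_rows = X.shape[0]
    idx = rng.integers(0, n_rows, size=n_rows)  # indices drawn with replacement
    return X[idx], y[idx]

def random_feature_subset(n_features):
    """Pick m = sqrt(M) of the M columns, without replacement, to consider
    when searching for the best split at a single node."""
    m = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=m, replace=False)

# Illustrative usage on a toy dataset
X = rng.normal(size=(100, 9))
y = rng.integers(0, 2, size=100)
X_boot, y_boot = bootstrap_rows(X, y)
print(random_feature_subset(X.shape[1]))  # e.g. three of the nine column indices
```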
For a regression model, the forest's prediction is the average of the individual trees' predictions. For a classification model, it is the mode (majority vote) of the individual trees' predictions.
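In practice a library handles both the tree growing and the aggregation. Here is a brief example using scikit-learn's RandomForestClassifier; the dataset and parameter values are only illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is grown on a bootstrap sample of the rows,
# and sqrt(M) columns are considered at each split (max_features="sqrt").
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    bootstrap=True,
    random_state=0,
)
forest.fit(X_train, y_train)

# predict() aggregates the trees' votes; score() reports test accuracy.
print(forest.score(X_test, y_test))
```

For regression, RandomForestRegressor works the same way but averages the trees' predictions instead of taking a majority vote.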
If you’d like to learn more, check out this paper.